fix(core): Sanitize lone surrogates in log body and attributes#20245
fix(core): Sanitize lone surrogates in log body and attributes#20245
Conversation
Semver Impact of This PR🟢 Patch (bug fixes) 📋 Changelog PreviewThis is how your changes will appear in the changelog. New Features ✨Core
Deps
Other
Bug Fixes 🐛Deno
Other
Internal Changes 🔧Deps
Other
🤖 This preview updates automatically when you update the PR. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Body sanitization skipped for parameterized string messages
- Replaced the primitive-only string guard with
isString(message)and added a regression test to ensurefmt/parameterizemessages with lone surrogates are sanitized in log bodies.
- Replaced the primitive-only string guard with
Or push these changes by commenting:
@cursor push 5a6da5722e
Preview (5a6da5722e)
diff --git a/packages/core/src/logs/internal.ts b/packages/core/src/logs/internal.ts
--- a/packages/core/src/logs/internal.ts
+++ b/packages/core/src/logs/internal.ts
@@ -7,7 +7,7 @@
import type { Integration } from '../types-hoist/integration';
import type { Log, SerializedLog } from '../types-hoist/log';
import { consoleSandbox, debug } from '../utils/debug-logger';
-import { isParameterizedString } from '../utils/is';
+import { isParameterizedString, isString } from '../utils/is';
import { getCombinedScopeData } from '../utils/scopeData';
import { _getSpanForScope } from '../utils/spanOnScope';
import { timestampInSeconds } from '../utils/time';
@@ -162,7 +162,7 @@
const serializedLog: SerializedLog = {
timestamp,
level,
- body: typeof message === 'string' ? _INTERNAL_removeLoneSurrogates(message) : message,
+ body: isString(message) ? _INTERNAL_removeLoneSurrogates(String(message)) : message,
trace_id: traceContext?.trace_id,
severity_number: severityNumber ?? SEVERITY_TEXT_TO_SEVERITY_NUMBER[level],
attributes: sanitizeLogAttributes({
diff --git a/packages/core/test/lib/logs/internal.test.ts b/packages/core/test/lib/logs/internal.test.ts
--- a/packages/core/test/lib/logs/internal.test.ts
+++ b/packages/core/test/lib/logs/internal.test.ts
@@ -1280,6 +1280,18 @@
expect(logBuffer?.[0]?.body).toBe('bad surrogate \uFFFD here');
});
+ it('sanitizes lone surrogates in parameterized log message body', () => {
+ const options = getDefaultTestClientOptions({ dsn: PUBLIC_DSN, enableLogs: true });
+ const client = new TestClient(options);
+ const scope = new Scope();
+ scope.setClient(client);
+
+ _INTERNAL_captureLog({ level: 'error', message: fmt`bad surrogate ${'\uD800'} here` }, scope);
+
+ const logBuffer = _INTERNAL_getLogBuffer(client);
+ expect(logBuffer?.[0]?.body).toBe('bad surrogate \uFFFD here');
+ });
+
it('sanitizes lone surrogates in log attribute values', () => {
const options = getDefaultTestClientOptions({ dsn: PUBLIC_DSN, enableLogs: true });
const client = new TestClient(options);This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
size-limit report 📦
|
node-overhead report 🧳Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.
|
| * Sanitizes serialized log attributes by replacing lone surrogates in both | ||
| * keys and string values with U+FFFD. | ||
| */ | ||
| function sanitizeLogAttributes(attributes: Attributes): Attributes { |
There was a problem hiding this comment.
m: just thinking out loud here but wdyt about doing this within serializeAttributes so that this applies to all telemetry attributes? I'm assuming the same limitation occurs for metrics and span attributes? Ideally we can keep this code path as performant as possible, so if we do it in the same loop instead of having to loop over the object again, it should be a bit faster.
There was a problem hiding this comment.
That's good feedback @Lms24 👍 I hesitated a bit to make a wider change in the scope of this PR to avoid possible side effects and limited the scope just to logs which I guess are more vulnerable to free-form user/developer text input. Looking at it again I think it makes sense to do this in serializeAttributes.
I'll be happy to revise to cover everything if it makes more sense from the Js maintainer's perspective 🙇
| * JSON-escaped form (e.g. `\uD800`). Replacing them at the SDK level ensures | ||
| * only the offending characters are lost instead of the whole payload. | ||
| */ | ||
| export function _INTERNAL_removeLoneSurrogates(str: string): string { |
There was a problem hiding this comment.
hmm given this is quite a lot of code and hence a bundle size impact, have we considered alternatives?
- can we remove prior to serde parsing in Relay?
- would it be the end of the world if we only use the
toWellFormednative method but avoid the fallback? Not sure if ReactNative supports this already.
If both options are a "No", that's fine, too. Just want to make sure we have our bases covered.
There was a problem hiding this comment.
can we remove prior to serde parsing in Relay?
Yes, handling this in Relay makes a lot more sense. The SDK side fix is probably more of an extra check if a relay fix is deployed. I'll check what would this involve.
would it be the end of the world if we only use the toWellFormed native method but avoid the fallback? Not sure if ReactNative supports this already.
I think it is supported. I'll further look into this.
There was a problem hiding this comment.
@Lms24 This is now ready for another pass 🤞
Also opened a Relay PR
b61af8f to
58c5a0e
Compare
Lone UTF-16 surrogates (U+D800–U+DFFF) in log message bodies or attribute strings cause serde_json on the server to reject the entire log batch. This replaces unpaired surrogates with U+FFFD at capture time, scoped to the logs path only. Fixes getsentry/sentry-react-native#5186 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`parameterize`/`fmt` creates a `String` object via `new String()`, so `typeof message` returns `'object'` not `'string'`, bypassing the sanitization. Use `String(message)` to coerce to a primitive before sanitizing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the manual charCodeAt-based fallback for surrogate sanitization to reduce bundle size impact. Uses native String.prototype.toWellFormed() which is supported in Node 20+, Chrome 111+, Safari 15.4+, Firefox 119+, and Hermes. On older runtimes the string is returned as-is. Also fixes lint errors from the `as any` cast by using bracket notation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use typed interface cast instead of bracket notation to avoid TS7015 (implicit any from index expression) in strict builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unnecessary type assertion to fix lint. Use describe.runIf and it.runIf to skip surrogate replacement tests on runtimes without toWellFormed() (Node 18). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Object() wrapper instead of `as unknown` cast to access isWellFormed/toWellFormed without triggering oxlint's unnecessary-assertion rule or TS strict mode errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f3146c4 to
cf4aa9d
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit cf4aa9d. Configure here.
Remove `as string` cast in sanitizeLogAttributes — TS already narrows attr.value to string after the type === 'string' check. Use Object() wrapper with Record<string, Function> typing and typeof guards in _INTERNAL_removeLoneSurrogates to avoid any type assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Summary
Lone UTF-16 surrogates (U+D800–U+DFFF not part of a valid pair) in log message bodies or string attribute values/keys cause
serde_jsonon the server to reject the entire queued log batch. This means one bad log entry silently drops all healthy logs in the same flush.This PR sanitizes unpaired surrogates by replacing them with U+FFFD (replacement character) at log capture time, scoped exclusively to the logs code path (
packages/core/src/logs/internal.ts).body(plain string andfmtparameterized string messages)String.prototype.toWellFormed()when available (Node 20+, Chrome 111+, Safari 15.4+, Firefox 119+, Hermes). On older runtimes without native support, the string passes through as-is.Server-side fix in Relay: getsentry/relay#5833
Fixes getsentry/sentry-react-native#5186
Test plan
_INTERNAL_removeLoneSurrogatescovering: clean strings, empty strings, valid emoji (surrogate pairs), lone high/low surrogates, surrogates at boundaries, consecutive lone surrogates, mixed valid pairs + lone surrogates, exact repro case from issue_INTERNAL_captureLogverifying sanitization of body (plain + parameterized), attribute values, and attribute keystoWellFormed()(Node 18)🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com